Why Build a CMP?

Authors

  • Lance Hammond
  • Benedict A. Hubbert
  • Michael Siu
  • Manohar K. Prabhu
  • Michael Chen
  • Kunle Olukotun

Abstract

The Hydra chip multiprocessor (CMP) integrates four MIPS-based processors and their primary caches on a single chip together with a shared secondary cache. A standard CMP offers implementation and performance advantages compared to wide-issue superscalar designs. However, it must be programmed with a more complicated parallel programming model to obtain maximum performance. To simplify parallel programming, the Hydra CMP supports thread-level speculation and memory renaming, a paradigm that allows performance similar to a uniprocessor of comparable die area on integer programs. This article motivates the design of a CMP, describes the architecture of the Hydra design with a focus on its speculative thread support, and describes our prototype implementation.

As Moore's law allows increasing numbers of smaller and faster transistors to be integrated on a single chip, new processors are being designed to use these transistors effectively to improve performance. Today, most microprocessor designers use the increased transistor budgets to build larger and more complex uniprocessors. However, several problems are beginning to make this approach to microprocessor design difficult to continue. To address these problems, we have proposed that future processor design methodology shift from simply making progressively larger uniprocessors to implementing more than one processor on each chip [1]. The following discusses the key reasons why single-chip multiprocessors are a good idea.

Parallelism

Designers primarily use additional transistors on chips to extract more parallelism from programs to perform more work per clock cycle. While some transistors are used to build wider or more specialized data path logic (to switch from 32 to 64 bits or add special multimedia instructions, for example), most are used to build superscalar processors. These processors can extract greater amounts of instruction-level parallelism (ILP) by finding nondependent instructions that occur near each other in the original program code. Unfortunately, there is only a finite amount of ILP present in any particular sequence of instructions that the processor executes because instructions from the same sequence are typically highly interdependent. As a result, processors that use this technique are seeing diminishing returns as they attempt to execute more instructions per clock cycle, even as the logic required to process multiple instructions per clock cycle increases quadratically.

A CMP avoids this limitation by primarily using a completely different type of parallelism: thread-level parallelism (TLP). We obtain TLP by running completely separate sequences of instructions on each of the separate processors simultaneously. Of course, a CMP may also exploit …
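
The contrast between ILP and TLP drawn in the abstract can be made concrete with a toy model. The sketch below is an illustration only, not taken from the Hydra article: it assumes unit-latency instructions, unlimited issue width, and fully independent threads, and every name in it (`critical_path_length`, `ideal_ilp`, `ideal_cmp_throughput`, the `chain` example) is hypothetical. Under those assumptions, the ILP of one instruction sequence is bounded by its longest dependence chain, while separate threads on separate processors simply add their individual bounds.

```python
from typing import Dict, List

# Hypothetical model: each instruction maps to the instructions it depends on.
DepGraph = Dict[str, List[str]]


def critical_path_length(deps: DepGraph) -> int:
    """Return the number of instructions on the longest dependence chain.

    Assumes the graph is acyclic and every dependency is itself a key.
    """
    memo: Dict[str, int] = {}

    def depth(instr: str) -> int:
        if instr not in memo:
            memo[instr] = 1 + max((depth(d) for d in deps[instr]), default=0)
        return memo[instr]

    return max((depth(i) for i in deps), default=0)


def ideal_ilp(deps: DepGraph) -> float:
    """Upper bound on instructions per cycle for one sequence
    (assumes unit latency and unlimited issue width)."""
    if not deps:
        return 0.0
    return len(deps) / critical_path_length(deps)


def ideal_cmp_throughput(threads: List[DepGraph]) -> float:
    """Sum of per-thread bounds, assuming one processor per thread
    and no dependences between threads."""
    return sum(ideal_ilp(t) for t in threads)


# Six instructions, but a chain of four limits how many can overlap.
chain: DepGraph = {
    "i1": [],
    "i2": ["i1"],
    "i3": ["i2"],
    "i4": ["i3"],
    "i5": [],
    "i6": ["i5"],
}

single_thread_bound = ideal_ilp(chain)                 # 6 instructions / chain of 4
four_thread_bound = ideal_cmp_throughput([chain] * 4)  # four independent copies
```

In this model, widening a single processor cannot raise `single_thread_bound`, because the dependence chain fixes it; adding independent threads raises the combined bound linearly. Real programs rarely offer fully independent threads, which is the gap that thread-level speculation is meant to address.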

Related articles

Impact of Data Sharing on CMP design: A study based on Analytical Modeling

In this work we study the effect of data and instruction sharing on cache miss rates. We then extend an analytical system-level throughput model to take multi-threaded data and instruction sharing into account. We use the model to provide insights into the interaction of thread count, cache size, off-chip bandwidth, and sharing on system throughput. Using specific examples we teach how the model...

WHY AND HOW TO APPLY QUANTUM LEARNING AS A NEW APPROACH TO IMPLEMENTATION THE CURRICULUM

The present study was philosophical and analytical research that examines quantum learning as an effective approach to the curriculum in a qualitative way. It explored books, published essays, and related studies, and took some advantages of online materials on the issue from domestic and foreign sources. Because of large body of data on the issue, only the relevant information was included. Da...

CMP$im: A Binary Instrumentation Approach to Modeling Memory Behavior of Workloads on CMPs

Chip multiprocessors are the next attractive point in the design space of future high performance processors. There is a growing need for simulation methodologies to determine the memory system requirements of emerging workloads in a reasonable amount of time. To explore the design space of a CMP memory hierarchy, this paper presents the use of binary instrumentation as an alternative to execut...

Random or predictable?: Adoption patterns of chronic care management practices in physician organizations

Background: Theories, models, and frameworks used by implementation science, including Diffusion of Innovations, tend to focus on the adoption of one innovation, when often organizations may be facing multiple simultaneous adoption decisions. For instance, despite evidence that care management practices (CMPs) are helpful in managing chronic illness, there is still uneven adoption by physician o...

Single-Path Programming on a Chip-Multiprocessor System

In this paper we explore a time-predictable chip-multiprocessor (CMP) system based on single-path programming. To keep the timing constant, even in the case of shared memory access for the CMP cores, the tasks on the cores are synchronized with the time-sliced memory arbitration unit. 1. The Single-Path CMP System. The main goal of our approach is to build an architecture that provides a combinat...

Publication date: 2007